Universidade Federal de Viçosa

Graduate Program in Genetics and Breeding

Department of General Biology




A new look at genotype-by-environment interaction: enviromics and probabilistic models



Prof. Dr. Kaio Olimpio das Graças Dias

Dr. Saulo Fabrício da Silva Chaves

Outline

  • Genomic Selection
    • Genetic and statistical basis
  • Genotype-by-environment interaction
  • Enviromics (+ Genomics)
    • Reaction norm models
    • Other strategies

Single marker analysis (QTL model)

\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]

  • First model to (try to) link a marker (QTL?) to the phenotype

Credits: Prof. Augusto Garcia
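A minimal sketch of the single-marker (QTL) model above: one least-squares regression of the phenotype on each marker, returning the slope \(\beta_1\) and its t statistic. The function name `single_marker_scan`, the 0/1/2 dosage coding, and the simulated QTL at marker 10 are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def single_marker_scan(y, M):
    """Fit y_i = beta0 + beta1 * x_i + e_i separately for each marker column of M."""
    n, p = M.shape
    betas = np.empty(p)
    tstats = np.empty(p)
    for j in range(p):
        X = np.column_stack([np.ones(n), M[:, j]])    # intercept + marker dosage
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - 2)               # residual variance
        cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of (beta0, beta1)
        betas[j] = coef[1]
        tstats[j] = coef[1] / np.sqrt(cov[1, 1])
    return betas, tstats

rng = np.random.default_rng(1)
n, p = 200, 50
M = rng.integers(0, 3, size=(n, p)).astype(float)     # hypothetical 0/1/2 dosages
y = 5 + 1.0 * M[:, 10] + rng.normal(0, 1, n)          # one simulated QTL at marker 10
betas, tstats = single_marker_scan(y, M)
print(int(np.argmax(np.abs(tstats))))                 # marker with the strongest association
```

Each marker is tested on its own, so with dense panels any marker in LD with the QTL can show a signal, and many tests are run at once.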

Quantitative Traits

Genetic architecture of drought tolerance in sorghum (Ribaut et al., 2009)

Abundant QTLs and markers

  • Many QTLs for the main commercially important traits
  • Sequencing technology: SNPs along the whole genome

  • Multiple regression?
    • What about the \(p \gg n\) problem?
    • What about the curse of dimensionality?
  • Solution: Linear mixed models!
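The \(p \gg n\) problem can be seen numerically: with more markers than individuals, \(\mathbf{X}'\mathbf{X}\) is singular, so ordinary least squares has no unique solution, while adding a ridge penalty (the shrinkage RRBLUP applies through \(\lambda = \sigma^2_\varepsilon / \sigma^2_m\)) makes the system solvable. A sketch on simulated genotypes; the sizes and the value of `lam` are arbitrary assumptions, and the intercept is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 30, 500                                     # far more markers than individuals
M = rng.integers(0, 3, size=(n, p)).astype(float)  # hypothetical 0/1/2 genotypes
y = rng.normal(size=n)

XtX = M.T @ M                                      # p x p, but rank at most n
print(np.linalg.matrix_rank(XtX), "of", p)         # rank-deficient: no unique OLS solution

lam = 10.0                                         # hypothetical shrinkage parameter
# Ridge (RRBLUP-type) estimate: X'X + lam * I is invertible for any lam > 0
beta_ridge = np.linalg.solve(XtX + lam * np.eye(p), M.T @ y)
print(beta_ridge.shape)
```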

Linkage disequilibrium

Linear mixed models

\[ \begin{bmatrix} \mathbf X' \mathbf X & \mathbf X' \mathbf Z \\ \mathbf Z' \mathbf X & (\mathbf Z'\mathbf Z + \lambda \mathbf K^{-1}) \end{bmatrix} \begin{bmatrix} \mathbf b \\ \mathbf u \end{bmatrix} = \begin{bmatrix} \mathbf X' \mathbf y \\ \mathbf Z' \mathbf y \end{bmatrix} \]

with \(\lambda = \sigma^2_\varepsilon / \sigma^2_u\) and \(\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{K} \sigma^2_u)\)

  • RRBLUP \(\rightarrow\) marker effects
  • GBLUP \(\rightarrow\) genomic estimated breeding value (GEBV): VanRaden (2008)
  • Bayesian alphabet (Bayes A, Bayes B, Bayes C\(\pi\), …)
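A sketch that builds and solves the mixed model equations above with NumPy, then checks the textbook equivalence between RRBLUP (random marker effects, \(\mathbf Z = \mathbf M\), \(\mathbf K = \mathbf I\)) and GBLUP (random genetic values, \(\mathbf Z = \mathbf I\), \(\mathbf K = \mathbf M \mathbf M'\)): with the same \(\lambda\), both give the same predicted genetic values. The helper `solve_mme`, the simulated data, and the variance ratio are assumptions for illustration. Markers are left uncentred here only so that \(\mathbf M \mathbf M'\) is invertible; a centred VanRaden (2008) \(\mathbf G\) usually needs a small constant added to its diagonal before inversion.

```python
import numpy as np

def solve_mme(X, Z, y, K_inv, lam):
    """Solve Henderson's mixed model equations; returns (b_hat, u_hat)."""
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * K_inv],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    n_fixed = X.shape[1]
    return sol[:n_fixed], sol[n_fixed:]

rng = np.random.default_rng(42)
n, p = 50, 200
M = rng.integers(0, 3, size=(n, p)).astype(float)    # hypothetical 0/1/2 genotypes
X = np.ones((n, 1))                                   # intercept only
y = 10 + M @ rng.normal(0, 0.1, size=p) + rng.normal(0, 1.0, size=n)
lam = 1.0 / 0.01                                      # assumed sigma2_e / sigma2_marker

# RRBLUP: u = marker effects, Z = M, K = I
_, a_hat = solve_mme(X, M, y, np.eye(p), lam)
g_rrblup = M @ a_hat

# GBLUP: u = genetic values, Z = I, K = M M' (same variance scale as the marker model)
_, g_gblup = solve_mme(X, np.eye(n), y, np.linalg.inv(M @ M.T), lam)

print(np.allclose(g_rrblup, g_gblup))
```

The RRBLUP system has one equation per marker, the GBLUP system one per individual, which is why GBLUP is usually cheaper when \(p \gg n\).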

Genomic prediction

  • Large-scale marker-assisted selection

Wallace et al. (2018)

Genomic prediction

  • Changes in the breeder's equation

\[ R = \frac{i \times r \times \sigma_g}{L} \]

  • Shorten the breeding cycle (\(L\))
  • Increase selection intensity (\(i\))
  • Increase selection accuracy (\(r\))
    • What about prediction accuracy?
  • Increase genetic variability (\(\sigma_g\))?
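The breeder's equation can be used to compare scenarios per unit of time. A sketch that also computes the selection intensity \(i\) for truncation selection from the proportion selected; the accuracies and cycle lengths below are hypothetical numbers chosen only to show how a shorter cycle \(L\) can outweigh a lower accuracy \(r\).

```python
from statistics import NormalDist

def selection_intensity(prop_selected):
    """Selection intensity i for truncation selection on a normally distributed trait."""
    z = NormalDist().inv_cdf(1.0 - prop_selected)   # truncation point
    return NormalDist().pdf(z) / prop_selected

def response_per_year(i, r, sigma_g, L):
    """Breeder's equation: R = i * r * sigma_g / L."""
    return i * r * sigma_g / L

i = selection_intensity(0.10)                       # keep the best 10%
# Hypothetical scenarios: phenotypic selection (higher r, long cycle)
# vs genomic selection (lower r, short cycle)
r_pheno = response_per_year(i, r=0.7, sigma_g=1.0, L=4.0)
r_gs = response_per_year(i, r=0.5, sigma_g=1.0, L=1.5)
print(round(i, 3), round(r_pheno, 3), round(r_gs, 3))
```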

Predictive ability of models

  • k-fold cross-validation

Zhou et al. (2016)
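A minimal sketch of k-fold cross-validation: individuals are split into k folds, the model is fitted without each fold in turn, and predictive ability is the Pearson correlation between predicted and observed values in the held-out fold. A ridge (RRBLUP-type) model with a fixed, assumed `lam` stands in for whatever genomic model is being evaluated; the data are simulated.

```python
import numpy as np

def ridge_fit_predict(M_train, y_train, M_test, lam):
    """RRBLUP-type ridge regression: fit on the training set, predict the test set."""
    m_mean = M_train.mean(axis=0)              # centre markers with training means
    Mc = M_train - m_mean
    mu = y_train.mean()
    p = Mc.shape[1]
    a = np.linalg.solve(Mc.T @ Mc + lam * np.eye(p), Mc.T @ (y_train - mu))
    return mu + (M_test - m_mean) @ a

def kfold_predictive_ability(M, y, k=5, lam=200.0, seed=0):
    """Pearson r(predicted, observed) in each of k validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = ridge_fit_predict(M[train_idx], y[train_idx], M[test_idx], lam)
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return np.array(scores)

rng = np.random.default_rng(1)
n, p = 250, 300
M = rng.integers(0, 3, size=(n, p)).astype(float)   # hypothetical genotypes
y = M @ rng.normal(0, 0.1, size=p) + rng.normal(0, np.sqrt(2.0), size=n)
scores = kfold_predictive_ability(M, y, k=5)
print(np.round(scores, 2), round(float(scores.mean()), 2))
```

Averaging the k correlations is common, but correlations computed within small folds are noisy and can be systematically biased (Zhou et al., 2016).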

Predictive ability of models?

  • Real (experimental) validation

Gezan et al. (2017)

Genotype-by-Environment Interaction

GEI

  • Is a model trained in one environment valid for another?

Multi-environment models

  • Two-stage analysis

\[ \mathbf{y} = \mathbf{1} \mu + \mathbf{X}_1 \mathbf{b} + \mathbf{X}_2 \mathbf{g} + \boldsymbol{\varepsilon} \Rightarrow \mathbf{\bar{y}} = \mathbf{1} \mu + \mathbf{Z} \mathbf{g} + \boldsymbol{\varepsilon} \]

  • Single-stage analysis

\[ \mathbf{y} = \mathbf{1} \mu + \mathbf{X}_1 \mathbf{b} + \mathbf{X}_2 \mathbf{e} + \mathbf{Z}_1 \mathbf{g} + \mathbf{Z}_2 \mathbf{ge} + \boldsymbol{\varepsilon} \]
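To make the single-stage model concrete, a sketch that builds the incidence matrices from a long-format trial (one record per genotype-environment cell). It assumes, only for illustration, the simplest structure \(\mathrm{Var}(\mathbf{ge}) = (\mathbf{I}_J \otimes \mathbf{G})\,\sigma^2_{ge}\) and checks that, at the record level, this Kronecker structure becomes a Hadamard (element-wise) product of two record-level matrices. The helper `incidence` and the small \(\mathbf{G}\) are hypothetical.

```python
import numpy as np

def incidence(levels, n_levels):
    """Records x levels 0/1 matrix: row k has a 1 in the column of record k's level."""
    Z = np.zeros((len(levels), n_levels))
    Z[np.arange(len(levels)), levels] = 1.0
    return Z

# Hypothetical long-format trial: 3 genotypes x 2 environments, one record per cell
geno = np.array([0, 1, 2, 0, 1, 2])
env = np.array([0, 0, 0, 1, 1, 1])
v, J = 3, 2

Z_g = incidence(geno, v)                  # records x genotypes (g)
X_e = incidence(env, J)                   # records x environments (e)
Z_ge = incidence(env * v + geno, v * J)   # records x GE cells (ge), environment-major order

# Each row of Z_ge is the Kronecker product of the matching rows of X_e and Z_g
assert np.allclose(Z_ge, np.stack([np.kron(X_e[k], Z_g[k]) for k in range(len(geno))]))

G = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])           # hypothetical genomic relationship matrix
var_ge = np.kron(np.eye(J), G)            # assumed Var(ge) / sigma2_ge = I_J (x) G

lhs = Z_ge @ var_ge @ Z_ge.T              # record-level covariance of Z_ge ge
rhs = (X_e @ X_e.T) * (Z_g @ G @ Z_g.T)   # Hadamard product of record-level matrices
print(np.allclose(lhs, rhs))
```

Richer structures (for example, heterogeneous variances or correlations between environments) replace \(\mathbf{I}_J\) with a \(J \times J\) environmental covariance matrix.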

Covariance structures

Assessing predictive ability in multi-environments

  • CV1: Ability to predict untested individuals
    • A single source of information: covariance between relatives
  • CV2: Sparse-testing scenario (individuals evaluated in some environments, not all)
    • Two sources of information: covariance between relatives and performance in tested environments

Ribeiro et al. (2024)
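A sketch of how CV1 and CV2 test sets can be drawn from a genotype-by-environment layout: CV1 hides all records of a random set of genotypes, whereas CV2 hides a random set of records, so that most hidden genotypes still have data in other environments. Function names, the fraction hidden, and the balanced 20 × 4 layout are illustrative assumptions.

```python
import numpy as np

def cv1_mask(geno, test_genotypes):
    """CV1: hide every record of the chosen genotypes (untested individuals)."""
    return np.isin(geno, test_genotypes)

def cv2_mask(n_records, frac, rng):
    """CV2: hide a random subset of records (sparse testing)."""
    mask = np.zeros(n_records, dtype=bool)
    hidden = rng.choice(n_records, size=int(frac * n_records), replace=False)
    mask[hidden] = True
    return mask

rng = np.random.default_rng(3)
v, J = 20, 4                                   # hypothetical: 20 genotypes x 4 environments
geno = np.repeat(np.arange(v), J)              # records sorted by genotype, then environment

test1 = cv1_mask(geno, rng.choice(v, size=4, replace=False))
test2 = cv2_mask(len(geno), 0.2, rng)

# CV1: no hidden genotype keeps a training record anywhere
print(set(geno[test1]) & set(geno[~test1]))
# CV2: most hidden genotypes still have training records in other environments
print(len(set(geno[test2]) & set(geno[~test2])), "of", len(set(geno[test2])))
```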

Enviromics

Looking from another perspective

  • Genomics tries to deal with GEI by looking at the “genotype” part
  • Enviromics deals with the “environment” part
  • Characterizing (typing) environments (envirotyping) using environmental features (EFs)
  • Specific patterns of phenotype-envirotype interactions

Reaction norm models

  • Baseline model (GBLUP)

\[ \mathbf{y} = \mathbf{1} \mu + \mathbf{g} + \boldsymbol{\varepsilon} \]

with \(\mathbf{g} \sim \mathcal{N}(0, \mathbf{G} \sigma^2_g)\), and \(\mathbf{G} \rightarrow\) VanRaden (2008)
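A sketch of the VanRaden (2008) genomic relationship matrix (his first method), \(\mathbf{G} = \mathbf{M}_c \mathbf{M}_c' / [2\sum_k p_k(1-p_k)]\), where \(\mathbf{M}_c\) holds the 0/1/2 marker codes centred by \(2p_k\) and \(p_k\) is the allele frequency of marker \(k\). The genotypes are simulated, and allele frequencies are estimated from the same sample, a common but not the only choice.

```python
import numpy as np

def vanraden_g(M):
    """VanRaden (2008) first method. M: individuals x markers, coded 0/1/2."""
    p = M.mean(axis=0) / 2.0                   # allele frequency of each marker
    Mc = M - 2.0 * p                           # centre each marker by 2p
    return Mc @ Mc.T / (2.0 * np.sum(p * (1.0 - p)))

rng = np.random.default_rng(11)
M = rng.integers(0, 3, size=(10, 1000)).astype(float)   # hypothetical genotypes
G = vanraden_g(M)
print(G.shape, np.allclose(G, G.T))
```

Because every marker column is centred, each row of \(\mathbf{G}\) sums to zero when frequencies come from the same sample, which makes this \(\mathbf{G}\) singular.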

  • Effects of EFs:

\[ \mathbf{y} = \mathbf{1} \mu + \mathbf{Z}_1 \mathbf{g} + \mathbf{Z}_2 \mathbf{w} + \boldsymbol{\varepsilon} \]

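with \(\mathbf{w} \sim \mathcal{N}(0, \mathbf{\Omega} \sigma^2_w)\), and \(\mathbf{\Omega} = \frac{\mathbf{W}^\prime \mathbf{W}}{q}\), where \(\mathbf{W}\) is the \(q \times J\) matrix of EFs (\(q\) EFs measured in \(J\) environments)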
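A sketch of the EF-based relationship matrix \(\mathbf{\Omega} = \mathbf{W}'\mathbf{W}/q\), assuming \(\mathbf{W}\) stores EFs in rows and environments in columns (so \(\mathbf{\Omega}\) is environment × environment) and that each EF is standardized across environments so that EFs measured in different units contribute comparably. The EF values are simulated; the orientation and the standardization are assumptions, not details given in the slides.

```python
import numpy as np

rng = np.random.default_rng(5)
q, J = 40, 6                                        # hypothetical: 40 EFs in 6 environments
scales = rng.uniform(1, 50, size=(q, 1))            # EFs in very different units
W_raw = rng.normal(size=(q, J)) * scales            # rows = EFs, columns = environments

# Standardize each EF across environments (assumed preprocessing)
W = (W_raw - W_raw.mean(axis=1, keepdims=True)) / W_raw.std(axis=1, keepdims=True)

Omega = W.T @ W / q                                 # J x J environmental relationship matrix
print(Omega.shape)
print(np.round(Omega, 2))
```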

How to address the marker × EF interaction?

  • Including pairwise interactions?

\[ \{m_1 w_1, m_1 w_2, ..., m_p w_q \} \]

  • Essentially, a Kronecker product between the marker matrix (\(\mathbf{M}\)) and the EF matrix (\(\mathbf{W}\))
  • Hundreds of EFs and thousands of markers: far too many terms

  • Using covariance functions:

\[ gw_i = g_i \times w_i \]

with \(\mathbf{g}\) and \(\mathbf{w}\) independent, so that

\[ E(gw_i) = E(g_i) \times E(w_i) = 0 \]

\[ Cov(gw_i, gw_{i^\prime}) = G_{ii^\prime} \times \Omega_{ii^\prime} \]

\[ \mathbf{y} = \mathbf{1} \mu + \mathbf{Z}_1 \mathbf{g} + \mathbf{Z}_2 \mathbf{w} + \mathbf{Z}_3 \mathbf{gw} + \boldsymbol{\varepsilon} \]

with \(\mathbf{gw} \sim \mathcal{N}(\mathbf{0}, \mathbf{G} \# \mathbf{\Omega} \sigma^2_{gw})\), where \(\#\) denotes the Hadamard (element-wise) product

  • Only the realized (observed) genotype-by-environment combinations enter the model
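The covariance-function shortcut can be checked numerically: for each record, build all \(p \times q\) products \(m_k w_l\) explicitly (the Kronecker product of the record's marker and EF vectors) and compare their Gram matrix with the Hadamard product of the marker and EF Gram matrices. The identity \((\mathbf a \otimes \mathbf b)'(\mathbf c \otimes \mathbf d) = (\mathbf a'\mathbf c)(\mathbf b'\mathbf d)\) makes them equal, so the huge explicit design is never needed. Sizes and data are hypothetical, and scaling constants are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 8, 30, 5                                   # hypothetical: 8 records, 30 markers, 5 EFs
M = rng.integers(0, 3, size=(n, p)).astype(float)    # marker profile of each record's genotype
W = rng.normal(size=(n, q))                          # EF profile of each record's environment

# Explicit route: all p * q products m_k * w_l for every record
MW = np.stack([np.kron(M[r], W[r]) for r in range(n)])   # n x (p * q)
K_explicit = MW @ MW.T

# Covariance-function route: never builds the p * q columns
K_hadamard = (M @ M.T) * (W @ W.T)

print(MW.shape, np.allclose(K_explicit, K_hadamard))
```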

  • Did you notice something off?
  • Dimension of \(\mathbf{G} = v \times v\) (\(v\) genotypes)
  • Dimension of \(\mathbf{\Omega} = J \times J\) (\(J\) environments)
  • How to make them compatible?
  • Using the incidence matrices!

\[ \mathbf{Z}_1^{(n \times v)} \times \mathbf{G}^{(v \times v)} \times \mathbf{Z}_1^{\prime \, {(v \times n)}} = \mathbf{K}^{(n \times n)} \]

\[ \mathbf{Z}_2^{(n \times J)} \times \mathbf{\Omega}^{(J \times J)} \times \mathbf{Z}_2^{\prime \, {(J \times n)}} = \mathbf{L}^{(n \times n)} \]

\[ \mathbf{L}^{(n \times n)} \# \mathbf{K}^{(n \times n)} \]
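A sketch of the dimension fix: \(\mathbf G\) (genotype × genotype) and \(\mathbf\Omega\) (environment × environment) are expanded to record × record matrices \(\mathbf K\) and \(\mathbf L\) through their incidence matrices, after which the Hadamard product is well defined. Both input matrices are random stand-ins, and `incidence` is a hypothetical helper. By the Schur product theorem, the Hadamard product of two positive semidefinite matrices is positive semidefinite, so \(\mathbf L \# \mathbf K\) is a valid covariance matrix.

```python
import numpy as np

def incidence(levels, n_levels):
    """Records x levels 0/1 incidence matrix."""
    Z = np.zeros((len(levels), n_levels))
    Z[np.arange(len(levels)), levels] = 1.0
    return Z

rng = np.random.default_rng(9)
v, J = 4, 3                                   # hypothetical: 4 genotypes, 3 environments
geno = np.repeat(np.arange(v), J)             # one record per genotype-environment cell
env = np.tile(np.arange(J), v)

A = rng.normal(size=(v, 50)); G = A @ A.T / 50        # stand-in v x v genomic matrix
B = rng.normal(size=(J, 10)); Omega = B @ B.T / 10    # stand-in J x J EF-based matrix

Z1 = incidence(geno, v)                       # n x v
Z2 = incidence(env, J)                        # n x J
K = Z1 @ G @ Z1.T                             # n x n genomic covariance between records
L = Z2 @ Omega @ Z2.T                         # n x n environmental covariance between records
KL = K * L                                    # n x n covariance of gw (Hadamard product)
print(K.shape, L.shape, KL.shape)
```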

  • “Missing environmental heritability”
  • As with imperfect LD between markers and QTLs \(\Rightarrow\) EFs are not enough to fully characterize environments
  • Add a “residual” environmental effect (\(\mathbf{e}\)) and a genotype-by-environment interaction (\(\mathbf{ge}\))
    • Environmental effects not captured by EFs

\[ \mathbf{y} = \mathbf{1} \mu + \mathbf{Z}_1 \mathbf{g} + \mathbf{Z}_2 \mathbf{w} + \mathbf{Z}_3 \mathbf{gw} + \mathbf{Z}_4 \mathbf{e} + \mathbf{Z}_5 \mathbf{ge} + \boldsymbol{\varepsilon} \]

with \(\mathbf{ge} \sim \mathcal{N}(\mathbf{0}, (\mathbf{K} \# \mathbf{Z}_4 \mathbf{Z}_4^\prime) \sigma^2_{ge})\)
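A sketch that assembles the record-level covariance implied by the full model, with \(\mathbf E = \mathbf Z_4 \mathbf Z_4'\) flagging records that share an environment, and predicts the records of one held-out environment with the usual BLUP formula \(\mathrm{Cov}(\text{signal}_{test}, \mathbf y_{train})\,\mathbf V_{train}^{-1}(\mathbf y_{train} - \mu)\). The variance components, the stand-in \(\mathbf G\) and \(\mathbf\Omega\), and the use of the training mean for \(\mu\) (instead of its GLS estimate) are assumptions for illustration. The final check shows why EFs matter for new environments: \(\mathbf e\) and \(\mathbf{ge}\) carry no covariance between environments, so beyond the genotype main effect only \(\mathbf L\) and \(\mathbf K \# \mathbf L\) link the new environment to the tested ones.

```python
import numpy as np

def incidence(levels, n_levels):
    """Records x levels 0/1 incidence matrix."""
    Z = np.zeros((len(levels), n_levels))
    Z[np.arange(len(levels)), levels] = 1.0
    return Z

rng = np.random.default_rng(4)
v, J = 15, 5                                  # hypothetical: 15 genotypes, 5 environments
geno = np.repeat(np.arange(v), J)
env = np.tile(np.arange(J), v)
n = len(geno)

A = rng.normal(size=(v, 200)); G = A @ A.T / 200      # stand-in genomic matrix
B = rng.normal(size=(J, 30)); Omega = B @ B.T / 30    # stand-in EF-based matrix

Z_g, Z_e = incidence(geno, v), incidence(env, J)
K = Z_g @ G @ Z_g.T          # genotype main effect (g)
L = Z_e @ Omega @ Z_e.T      # EF main effect (w)
E = Z_e @ Z_e.T              # 1 when two records share an environment (e)

# Hypothetical variance components: g, w, gw, e, ge, residual
s2 = {"g": 1.0, "w": 2.0, "gw": 0.5, "e": 1.0, "ge": 0.5, "res": 1.0}
S = s2["g"] * K + s2["w"] * L + s2["gw"] * K * L + s2["e"] * E + s2["ge"] * K * E
V = S + s2["res"] * np.eye(n)

y = 10 + rng.multivariate_normal(np.zeros(n), V)     # simulated phenotypes

test = env == J - 1          # hold out the last environment (CV0-style)
train = ~test
mu = y[train].mean()         # simple stand-in for the GLS estimate of mu
pred = mu + S[np.ix_(test, train)] @ np.linalg.solve(V[np.ix_(train, train)], y[train] - mu)

print(np.corrcoef(pred, y[test])[0, 1])
print(np.all(E[np.ix_(test, train)] == 0))   # e and ge share nothing across environments
```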

Extension

  • GCA, GCA × EFs, GCA × environment interaction
  • SCA, SCA × EFs, SCA × environment interaction

Ribeiro et al. (2024)

Assessing predictive ability: leave-one-environment-out (LOO)

  • CV0: prediction of new environments
  • CV00: prediction of new environments and new genotypes

Ribeiro et al. (2024)
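A sketch of the two leave-one-environment-out schemes: in CV0 every record of the held-out environment is predicted from all other environments; in CV00 the held-out genotypes are also removed from the training environments, so both the environment and the genotypes are new. Records of training genotypes in the held-out environment, and of held-out genotypes elsewhere, are used in neither set. The balanced 10 × 4 layout and the fixed set of held-out genotypes are hypothetical; in practice the genotype set would also be resampled.

```python
import numpy as np

def cv0_split(env, held_out_env):
    """CV0: predict every record of one environment from all other environments."""
    test = env == held_out_env
    return ~test, test

def cv00_split(geno, env, held_out_env, held_out_genos):
    """CV00: new environment and new genotypes; drop their records from training."""
    new_geno = np.isin(geno, held_out_genos)
    test = (env == held_out_env) & new_geno
    train = (env != held_out_env) & ~new_geno
    return train, test

v, J = 10, 4                                  # hypothetical balanced trial
geno = np.repeat(np.arange(v), J)
env = np.tile(np.arange(J), v)

for j in range(J):                            # leave one environment out at a time
    train0, test0 = cv0_split(env, j)
    train00, test00 = cv00_split(geno, env, j, held_out_genos=[0, 1, 2])
    print(j, int(train0.sum()), int(test0.sum()), int(train00.sum()), int(test00.sum()))
```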

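  • Dealing with different sample sizes across environments:

\[ r_w = \frac{\sum_{j=1}^{J} \rho_{\hat{y}\bar{y},j} / V(\rho_{\hat{y}\bar{y},j})}{\sum_{j=1}^{J} 1 / V(\rho_{\hat{y}\bar{y},j})} \]

with \(\rho_{\hat{y}\bar{y},j}\) the correlation between predicted values (\(\hat{y}\)) and means (\(\bar{y}\)) in environment \(j\), and \(V(\cdot)\) its sampling variance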

Ribeiro et al. (2024)
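A sketch of the weighted predictive ability \(r_w\) as an inverse-variance weighted mean of per-environment correlations. How \(V(\cdot)\) was obtained in Ribeiro et al. (2024) is not stated on the slide; here it is assumed to be the large-sample approximation \(V(\rho_j) \approx (1-\rho_j^2)^2/(n_j-1)\), where \(n_j\) is the number of genotypes evaluated in environment \(j\). The correlations and sample sizes are hypothetical.

```python
import numpy as np

def weighted_r(rhos, n_per_env):
    """Inverse-variance weighted mean of per-environment correlations."""
    rhos = np.asarray(rhos, dtype=float)
    n = np.asarray(n_per_env, dtype=float)
    var = (1.0 - rhos**2) ** 2 / (n - 1.0)   # assumed large-sample Var(r)
    w = 1.0 / var
    return np.sum(w * rhos) / np.sum(w)

rhos = [0.60, 0.20, 0.45]     # hypothetical per-environment predictive abilities
n_env = [150, 20, 80]         # hypothetical genotypes evaluated per environment
print(round(weighted_r(rhos, n_env), 3), round(float(np.mean(rhos)), 3))
```

Environments with many evaluated genotypes, and therefore more precise correlations, pull \(r_w\) toward their values; this approximation breaks down at \(|\rho_j| = 1\), where the variance is zero.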

  • Ability to distinguish the top 10

Ribeiro et al. (2024)
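The exact top-10 metric used by Ribeiro et al. (2024) is not given on the slide; one simple version, assumed here, is the fraction of the observed top k genotypes that also appear in the predicted top k. A sketch with simulated values:

```python
import numpy as np

def top_k_coincidence(pred, obs, k=10):
    """Fraction of the observed top-k genotypes also ranked top-k by the predictions."""
    top_pred = set(np.argsort(pred)[::-1][:k])
    top_obs = set(np.argsort(obs)[::-1][:k])
    return len(top_pred & top_obs) / k

rng = np.random.default_rng(8)
obs = rng.normal(size=100)                          # hypothetical observed means
pred = obs + rng.normal(scale=1.0, size=100)        # hypothetical noisy predictions
print(top_k_coincidence(pred, obs))
```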

Other approaches

Fernandes et al. (2024)

Araújo et al. (2024)

Resende et al. (2024)

References

Araújo, M. S., Chaves, S. F. S., Dias, L. A. S., Ferreira, F. M., Pereira, G. R., Bezerra, A. R. G., Alves, R. S., Heinemann, A. B., Breseghello, F., Carneiro, P. C. S., Krause, M. D., Costa-Neto, G., & Dias, K. O. G. (2024). GIS-FA: An approach to integrating thematic maps, factor-analytic, and envirotyping for cultivar targeting. Theoretical and Applied Genetics, 137(4), 80. https://doi.org/10.1007/s00122-024-04579-z
Fernandes, I. K., Vieira, C. C., Dias, K. O. G., & Fernandes, S. B. (2024). Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials. Theoretical and Applied Genetics, 137(8), 189. https://doi.org/10.1007/s00122-024-04687-w
Gezan, S. A., Osorio, L. F., Verma, S., & Whitaker, V. M. (2017). An experimental validation of genomic selection in octoploid strawberry. Horticulture Research, 4, 16070. https://doi.org/10.1038/hortres.2016.70
Resende, R. T., Xavier, A., Silva, P. I. T., Resende, M. P. M., Jarquin, D., & Marcatti, G. E. (2024). GIS-based G × E modeling of maize hybrids through enviromic markers engineering. New Phytologist. https://doi.org/10.1111/nph.19951
VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414–4423. https://doi.org/10.3168/jds.2007-0980
Wallace, J. G., Rodgers-Melnick, E., & Buckler, E. S. (2018). On the road to breeding 4.0: Unraveling the good, the bad, and the boring of crop quantitative genomics. Annual Review of Genetics, 52(1), 421–444. https://doi.org/10.1146/annurev-genet-120116-024846
Zhou, Y., Isabel Vales, M., Wang, A., & Zhang, Z. (2016). Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction. Briefings in Bioinformatics, bbw064. https://doi.org/10.1093/bib/bbw064